IMPROVING


MASS STORAGE 


DATA INTEGRITY 


Designers can do a lot to improve the reliability of the data 
being read from Winchester disks and other mass storage 
media. In fact, they should be doing more. 


by Max Roth 


Mass storage performance and reliability are tied to 
data integrity. However, even with error correction 
schemes, many recoverable errors impede through- 
put. Lost data affect performance even more 
dramatically. As 5¼” Winchester drive capacities
and bit and track density increase, the read channel 
must be enhanced to maintain acceptable disk drive 
error rates. 

Primarily, acceptable disk drive error rates are
determined (somewhat arbitrarily) by disk technology
rather than by predefined system requirements.
Most 5¼” Winchester products support error rates
of 1 in 10^10 bits for recoverable (soft) errors. For
nonrecoverable (hard) errors, 1 in 10^12 bits is the
norm. By definition, soft errors are recoverable by
multiple read retries and therefore do not necessarily
affect data reliability. Excessive soft errors, however,
may degrade throughput because of multiple read
retries. Nonrecoverable hard errors occur primarily
as a result of defects in a disk’s recording surface.
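
To put these rates in perspective, the short sketch below converts a bit error rate into the average time between errors. It assumes continuous reading at the 5M-bps transfer rate of standard drives, with no seek or latency time; the function name is illustrative only.

```python
def seconds_between_errors(bits_per_error: float, data_rate_bps: float) -> float:
    """Average time between errors while reading continuously."""
    return bits_per_error / data_rate_bps

DATA_RATE_BPS = 5e6  # 5M-bps transfer rate (assumed sustained, no seek time)

soft = seconds_between_errors(1e10, DATA_RATE_BPS)  # recoverable errors
hard = seconds_between_errors(1e12, DATA_RATE_BPS)  # nonrecoverable errors

print(f"soft: one error every {soft:.0f} s (about {soft / 60:.0f} min)")
print(f"hard: one error every {hard:.0f} s (about {hard / 3600:.1f} h)")
```

Under these assumptions, a soft error appears roughly every half hour of continuous reading, which is why excessive retries show up as lost throughput.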

Data integrity can be maintained two ways. 
First, the system designer can maintain low error 
rates by using error correction code (ECC). ECC 
solutions vary dramatically in terms of redundancy 
required, correction capability, and miscorrection 
errors. Both hard and soft errors are correctable
with ECC. However, it should not be used to correct
soft errors where a particular code’s miscorrection
probability is greater than the probability of recovery
from multiple read retries. On the other hand, ECC
does offer significant data reliability improvement 
for detecting and correcting hard errors. Histori- 
cally, though, few system designers have accepted 
the responsibility of maintaining data integrity. 

Relying on disk drive manufacturers to assure 
low error rates is the second and traditional way to 
maintain data integrity. But many in the computer 
industry believe that disk drive manufacturers are 
unnecessarily carrying the primary burden for 
maintaining disk system data reliability. By pro- 
viding nearly perfect disk media and sophisticated 
read/write (R/W) channels, disk manufacturers 
have maintained very low disk drive error rates. 
System reliability could easily be maintained if
controller manufacturers and designers improved
their own error correction schemes by using greater
redundancy. They could also improve reliability by
selecting ECC codes that increase correction capability
while minimizing miscorrection. This would
hold even with a significant increase in raw disk
drive error rates. In any case, system manufacturers
should implement ECC to guard against error
rate variations among disk drives.

While system designers continue to rely on disk 
drive manufacturers to provide low error rates, the 
manufacturers must maintain these low error rates 
by using nearly perfect recording surfaces and a 
capable read channel. It is worth noting that data 
separation is an important read channel function. 
This feature is implemented in the disk controller 
for present 5¼” Winchester disk drives. Because
the data separator’s performance can contribute to 
data errors, future interface standards are incorpo- 
rating data separation in the disk drive electronics 
rather than in the controller. This places full 
responsibility for data integrity on the disk drive 
manufacturer. 


Causes of data errors 

A data bit from the disk drive must be detected 
within a fixed length of time, known as the decision 
window. The data transfer rate and the encoding 
scheme determine the duration of the decision window.
In the case of standard 5¼” Winchester drives,
using a 5M-bps data transfer rate and modified
frequency modulation (MFM) encoding, the decision
window is 100 ns wide. Any bit not detected
within this time period is considered an error.
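
The 100-ns figure follows from the data rate and the code rate. The sketch below assumes the usual description of MFM as a rate-1/2 code, in which each 200-ns data bit cell is split into two detection windows; the function and parameter names are illustrative.

```python
def decision_window_ns(data_rate_bps: float, code_rate: float) -> float:
    """Width of one detection window in nanoseconds.

    code_rate is data bits per recorded channel bit (1/2 for MFM), so each
    data bit cell is divided into 1/code_rate detection windows.
    """
    bit_cell_ns = 1e9 / data_rate_bps
    return bit_cell_ns * code_rate

# MFM at 5M bps: 200-ns bit cell, two windows per cell
print(decision_window_ns(5e6, 1 / 2))
```

Raising the data rate or lowering the code rate shrinks the window, which leaves less room for bit shift.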

In the ideal world where there is no interference 
and no noise, all data bits are centered within the 
decision window and no data errors occur. If inter- 
symbol interference (pulse interaction) is con- 
sidered, a bit shift or jitter occurs (no pulses are 
exactly centered but all are within the decision 
window). The amount of bit shift can be deter- 
mined by superimposing adjacent bits. This 
amount is primarily a function of the encoding 
scheme used. 

Noise and jitter intrinsic to the data separator
also induce bit shift. Noise includes media, head,
and electronic noise; overwrite modulation; and
adjacent track interference caused by mispositioning
of the read head in the data track. The result is a bit
distribution superimposed on the bit shift due to
pulse interaction. Mathematically, the bit shift
(J = bit shift jitter) can be expressed as

    J = X1 + X2 + X3 + X4

where
    X1 = zero-crossing jitter caused by noise
    X2 = intersymbol interference
    X3 = data separator window jitter
    X4 = mispositioning jitter

The nominal value of J is zero, as in the ideal 
case. Deviation from nominal is due to the vari- 
ables X1, X2, X3, and X4. These variables have 
Just what is ECC?


by Neal Glover 
Data Systems Technology 


Error correction code (ECC) allows data to be accu- 
rately reconstructed from encoded data that contain 
errors. These codes are used in computer and com- 
munication systems to increase data storage and 
transmission integrity. 

An encoded data record typically contains a data 
segment that is identical to the raw data, and a 
redundant segment that is generated from the raw 
data by a generator polynomial. Dividing the encoded 
data by the generator polynomial is the first step in 
decoding. Data are assumed to be error free only if 
the remainder is zero. Usually, a nonzero remainder 
has enough information to allow the accurate recon- 
struction of the original data, provided that errors 
within the encoded record do not exceed the capability 
of the code being used. 
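
The encode-and-divide procedure can be sketched in a few lines. The generator polynomial used here (x^8 + x^2 + x + 1) is an arbitrary illustrative choice, not one taken from any particular controller, and the sketch shows only the zero-remainder check. Using a nonzero remainder to locate and correct errors is not shown.

```python
GENERATOR = 0x107                  # x^8 + x^2 + x + 1 (illustrative assumption)
R = GENERATOR.bit_length() - 1     # number of redundant bits (8)

def remainder(value: int, nbits: int) -> int:
    """Remainder of an nbits-wide polynomial over GF(2) divided by GENERATOR."""
    for shift in range(nbits - 1, R - 1, -1):
        if value & (1 << shift):
            value ^= GENERATOR << (shift - R)
    return value

def encode(data: int, data_bits: int) -> int:
    """Append a redundant segment so the encoded record divides evenly."""
    shifted = data << R
    return shifted | remainder(shifted, data_bits + R)

def is_error_free(record: int, data_bits: int) -> bool:
    """A zero remainder is taken to mean the record contains no errors."""
    return remainder(record, data_bits + R) == 0

record = encode(0xC0FFEE, 24)
print(is_error_free(record, 24))              # intact record
print(is_error_free(record ^ (1 << 13), 24))  # one bit flipped
```

The data segment of the encoded record is identical to the raw data, and the low-order bits hold the redundant segment.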

ECC has guaranteed correction and detection abili- 
ties. Errors that exceed a code’s guaranteed correction 
and detection capacities are subject to miscorrection. 
Miscorrection probability is a function of record
length, correction ability, and redundancy. If some
error types are more probable than others, polynomial
selection can influence miscorrection probability.

Historically, most magnetic disk controllers have 
used single-burst correcting codes. Most of these 
controllers use reread to recover from temporary 
errors, and error correction to recover from hard 
errors. This technique maintains data accuracy with 
codes that use only a modest amount of redundancy. 
Some new controller designs use more redundancy 
to implement more powerful codes, including multiple- 
burst correcting codes. In addition, error-tolerant 
techniques are being used for address marks, sync 
marks, and header information. Multiple-burst cor- 
recting codes and other error-tolerant techniques will 
likely be widely used in future disk controllers due to 
new pushes in disk technology, new defect philoso- 
phies, and the lower cost of large scale integration. 

Fig 1 A marginal variable frequency oscillator (MVFO) plot
is useful in determining error rate and margin information.
The bit shift’s relation to error rates and time is plotted with
the goal of optimizing window margin (Tw) times.


a mean value of zero and a probability density
function (PDF) that is symmetrical about that mean.
In most cases, the PDF can be approximated by a
Gaussian distribution.

If the total bit shift jitter, J, is to satisfy a stringent
requirement, the same holds for variables X1,
X2, etc. A practical measure for the total jitter
would be the variance of this function as derived
from the variance of the variables.
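
Written out, and assuming the four components are uncorrelated with zero mean (the sigma symbols are notation introduced here for illustration), the variance of the total jitter is the sum of the component variances:

```latex
\sigma_J^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \sigma_{X_3}^2 + \sigma_{X_4}^2
```

The largest component therefore dominates, so tightening a minor contributor does little for the total.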

Assuming that the input variables are not correlated
and are normally distributed, the PDF for J
will be a normal distribution. A read error
occurs any time the zero-crossing or bit detection
falls outside the decision window. If the PDF of the
zero-crossing jitter is Gaussian, the probability of
making an error is therefore the area under the PDF
from a time (Tw) to infinity. (See the Bibliography
for publications that support this assumption.)
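
As a rough illustration, the tail area can be evaluated with the complementary error function. The sketch assumes that Tw is measured from the window center to its edge (50 ns for a 100-ns window) and counts both early and late tails, which by symmetry doubles the one-sided area. The sigma values are hypothetical, not measurements.

```python
import math

def error_probability(t_w_ns: float, sigma_ns: float) -> float:
    """P(|J| > t_w) for zero-mean Gaussian jitter with standard deviation sigma."""
    return math.erfc(t_w_ns / (sigma_ns * math.sqrt(2.0)))

# Hypothetical total jitter values, in nanoseconds
for sigma in (6.0, 8.0, 10.0):
    print(f"sigma = {sigma:4.1f} ns  P(error) = {error_probability(50.0, sigma):.2e}")
```

Because the tail falls off so steeply, a small increase in jitter changes the error rate by orders of magnitude.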

If the decision window is varied, the number of
errors will also vary, so errors can easily be plotted
against decision window time. When the log of
errors is plotted against a variable decision window,
the resultant curve is referred to as
a marginal variable frequency oscillator (MVFO)
plot (Fig 1).

MVFO gets its name from the technique of vary- 
ing the data separator window or decision window 
by marginal control of the data separator’s clock 
frequency. Test equipment can plot MVFO curves 
for disk drives. The MVFO plot in Fig 1 shows the 
relative bit shift induced by both pulse interaction 
and noise. 
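
The general shape of such a curve can be reproduced with a simple model. This is a sketch under stated assumptions rather than a description of any test equipment: every bit is shifted by a fixed amount from pulse interaction, Gaussian noise is added on top, and the shift and noise values are hypothetical.

```python
import math

def q(x: float) -> float:
    """Upper tail area of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mvfo_error_rate(half_window_ns, pulse_shift_ns, noise_sigma_ns):
    """Probability that a shifted, noisy bit falls outside +/- half_window."""
    far_edge = q((half_window_ns + pulse_shift_ns) / noise_sigma_ns)
    near_edge = q((half_window_ns - pulse_shift_ns) / noise_sigma_ns)
    return far_edge + near_edge

# Sweep the half-window and tabulate the log of the error rate
for t in range(5, 55, 5):
    rate = mvfo_error_rate(t, pulse_shift_ns=15.0, noise_sigma_ns=5.0)
    print(f"{t:3d} ns  log10(error rate) = {math.log10(rate):7.2f}")
```

With the window narrower than the pulse-interaction shift, nearly every bit is missed. Once the window clears the shift, the noise tail takes over and the curve drops steeply.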

Error rates are essentially 100% when the deci- 
sion window is reduced to less than the bit shift 
time induced by pulse interaction. The intrinsic 
error rate is the best error rate achieved when using 
the maximum available decision window. By reducing
the decision window until the acceptable
error rate is measured, a window margin factor can 
be obtained. Note that any chosen acceptable error 
rate must be higher than the read channel’s intrinsic 
error rate. An MVFO plot can be generated for any 
disk drive. It is also a useful tool in characterizing 
error-rate performance. 
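
The margin measurement can be mimicked numerically. The sketch below assumes purely Gaussian jitter and searches for the narrowest half-window that still meets the acceptable error rate; the 6-ns sigma and the 1 in 10^10 target are illustrative values only.

```python
import math

def error_rate(half_window_ns: float, sigma_ns: float) -> float:
    """Two-sided Gaussian tail beyond +/- half_window (assumed jitter model)."""
    return math.erfc(half_window_ns / (sigma_ns * math.sqrt(2.0)))

def window_margin_ns(max_half_window_ns, sigma_ns, acceptable_rate):
    """How far the half-window can shrink before exceeding acceptable_rate."""
    if error_rate(max_half_window_ns, sigma_ns) > acceptable_rate:
        raise ValueError("acceptable rate is below the intrinsic error rate")
    lo, hi = 0.0, max_half_window_ns  # rate at lo is too high, rate at hi is acceptable
    for _ in range(60):               # bisection on the monotonic error curve
        mid = (lo + hi) / 2.0
        if error_rate(mid, sigma_ns) > acceptable_rate:
            lo = mid
        else:
            hi = mid
    return max_half_window_ns - hi

print(f"margin = {window_margin_ns(50.0, 6.0, 1e-10):.1f} ns")
```

The guard at the top enforces the point made above: a target better than the intrinsic error rate cannot be met at any window setting.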

If data rates are fixed, as in present 5¼” Winchesters,
then higher track density achieves
increased capacity. Thermal expansion between
servo surface and data surfaces, bearing noise (referred
to as “nonservoable” mechanical errors),
and servo tracking errors limit track density. The
density of recently introduced 5¼” Winchesters is
approximately 1000 tpi, as opposed to 300 to 400 tpi
in low capacity drives.

Ontrack and offtrack adjacent interference are 
primary contributors to the bit shift noise compo- 
nent. Fig 2 illustrates the ontrack and offtrack 
interference mechanism. Offtrack interference [Fig 
2(a)] is due to signal noise picked up by the R/W 
head from the adjacent track as a result of head-to- 
track misposition. Ontrack interference [Fig 2(b)] 


Fig 2 Two types of interference plague read heads.
Offtrack interference (a) is caused by the pickup of
adjacent track data. Ontrack interference (b) is caused by
previously recorded information that was not precisely
centered on track.


Fig 3 Low capacity Winchester read circuitry gets by with a
minimum of components. After amplification, filtering, and
differentiation, the signal is fed to a zero-crossing detector.
The resultant output corresponds to signal peaks.


is due to noise picked up from previously written 
data on the same track, where the overwrite was 
not precisely centered. The read channel must 
accommodate approximately 12% misposition of 
head to track to maintain an acceptable error rate. 
It is up to the disk drive mechanical and servo 
systems to ensure that head to track misposition is 
maintained at less than 12% of track pitch. 
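
In absolute terms, the 12% budget tightens quickly as track density rises. The following sketch converts tracks per inch into track pitch and the corresponding misposition allowance. The 350-tpi entry is simply a midpoint of the 300- to 400-tpi range for low capacity drives.

```python
MICRONS_PER_INCH = 25_400.0

def track_pitch_um(tracks_per_inch: float) -> float:
    """Center-to-center track spacing in micrometers."""
    return MICRONS_PER_INCH / tracks_per_inch

def misposition_budget_um(tracks_per_inch: float, fraction: float = 0.12) -> float:
    """Allowed head-to-track misposition as a fraction of track pitch."""
    return fraction * track_pitch_um(tracks_per_inch)

for tpi in (350, 1000):
    print(f"{tpi:5d} tpi: pitch {track_pitch_um(tpi):5.1f} um, "
          f"budget {misposition_budget_um(tpi):4.2f} um")
```

At 1000 tpi the mechanical and servo systems must hold the head to within about 3 micrometers of track center.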


A read channel rationale 

In simple form, a drive’s read channel amplifies 
the signal emanating from the magnetic read heads. 
However, read channels are becoming increasingly 
complex and vital to data reliability as 5¼” Winchesters
grow in capacity and performance. Today’s
high capacity drives operate at up to 960 tpi. They 
use much less signal energy and have higher noise 
interference susceptibility. For example, the Vertex 
V100 series generates a signal of 0.4 to 0.8 mV peak 
to peak, versus 2 to 5 mV for a typical low capacity 
drive. 

Obviously, read channel design becomes more
critical in the higher capacity unit. In addition, 
many high capacity drives operate with varied 
encoding schemes, such as run length limited 
(RLL). These schemes are more efficient, but they 
also require greater bandwidth than MFM en- 
coding. Unfortunately, greater bandwidth chan- 
nels are more susceptible to noise. 

Fig 3 illustrates a typical low capacity 5¼” Winchester
read channel control circuit. The readback
signal picked up by the head is amplified and then 
differentiated after it is filtered by the low pass 
filter. The zero-crossing detector output corre- 
sponds to the readback signal peaks. In the high 
resolution case, false detection can occur due to a 
droop in the differentiated signal. A time domain 
filter solves the problem by ignoring those pulses. 
However, this solution is sensitive to the encoding 
scheme used. 

To achieve low error rates and allow flexible selec- 
tion of recording codes, several circuit elements 
can be incorporated into modern Winchester hard- 
ware. Signal preamplifiers located on the actuator 
arm, near the R/W heads, improve the signal-to- 
noise ratio by amplifying the signal before additive 
noise is introduced. Automatic gain control (AGC)
compensates for variations in head signal ampli- 
tude. These variations are caused by normal varia- 
tions in head flying height and differences in the 
efficiency of media and heads. Signal qualifier cir- 
cuits ensure accurate data detection. 

The signal qualifier circuit couples a peak detec- 
tor with a zero-crossing detector. The peak detec- 
tor accurately determines the position of the peaks 
in the time domain. By correlating those peaks with 
the zero-crossing detector output, the pulses that 
are caused by noise can be rejected. As the fre- 
quency span of the recording code is enlarged, the 
probability of false zero-crossing detection 
becomes higher. 
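
The idea can be illustrated in software, although the real qualifier is an analog circuit. The sketch below is a simplified stand-in: it builds a synthetic readback signal from two isolated pulses plus a small ripple representing noise, finds every zero crossing of the differentiated signal, and keeps only those that coincide with a large signal amplitude. The pulse shape, ripple, and threshold are all hypothetical choices.

```python
import math

def readback_signal(n: int = 400) -> list:
    """Synthetic readback: pulses at samples 100 (+) and 300 (-), plus ripple."""
    signal = []
    for t in range(n):
        pulse = (1.0 / (1.0 + ((t - 100) / 10.0) ** 2)
                 - 1.0 / (1.0 + ((t - 300) / 10.0) ** 2))
        ripple = 0.02 * math.sin(2.0 * math.pi * t / 23.0)  # stands in for noise
        signal.append(pulse + ripple)
    return signal

def zero_crossings(signal: list) -> list:
    """Indices where the differentiated signal changes sign (candidate peaks)."""
    diff = [b - a for a, b in zip(signal, signal[1:])]
    return [i + 1 for i in range(len(diff) - 1) if diff[i] * diff[i + 1] < 0]

def qualify(signal: list, crossings: list, threshold: float) -> list:
    """Keep only crossings that coincide with a sufficiently large peak."""
    return [i for i in crossings if abs(signal[i]) >= threshold]

signal = readback_signal()
raw = zero_crossings(signal)
data = qualify(signal, raw, threshold=0.5)
print(len(raw), "raw crossings; qualified peaks at samples", data)
```

Between the pulses, where the signal is nearly flat, the ripple alone produces many zero crossings in the derivative. Qualification discards them without relying on the spacing between pulses, which is why this approach depends less on the recording code than a time domain filter does.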

While the MFM frequency ratio is 1:2, it can be as 
high as 1:4 in some RLL codes. Therefore, a peak 
detector scheme is less code dependent than a time 
domain filter. Fig 4 illustrates the qualifier circuit’s 
effectiveness and shows an actual signal from the 
R/W head after preamplification. The differen- 
tiated signal shows significant deflections near the 
zero crossing. 

If a zero-crossing detector circuit is used without 
signal qualification, the resultant data out are not 
usable. With signal qualification, nondata induced 
zero-crossing pulses are eliminated, leaving usable 
data out. Although these circuit elements are not 
generally found in low capacity 5¼” Winchesters,
Fig 4 By eliminating false zero-crossing frequency
components, reliable data are ensured. 


Fig 5 Vertex’s high capacity Winchester R/W control takes an equalized signal output of the AGC amplifier and
differentiates, filters, and digitizes it. A peak follower circuit processes this signal and, in conjunction with a
zero-crossing detector, generates read data.


they are common in high capacity, high performance
disk drives such as the IBM 3380.

The control block diagram of the V100 series read 
channel incorporates the elements described (Fig 5). 
In the V100 read channel, a preamp located close to 
the head itself amplifies the head signal. The ampli- 
fied signal is applied to the read preamp via a bal- 
anced transmission line. After amplification, an 
equalizer modifies the signal spectrum. An AGC 
amplifier holds the signal at a constant amplitude 
to allow for head output variation and amplifier 
tolerances. AGC amp output is differentiated, fil- 
tered, and digitized by the zero-crossing detector. 
Then, after qualification by the peak follower, 
these data are available for reading. 

Sophisticated read channel implementations will
become increasingly important as 5¼” Winchester
drive capacity and performance increase. Track
and bit densities will continue to rise with improvements
in magnetic head and media technologies.
New encoding schemes and higher transfer rates
will further enhance 5¼” drive performance. However,
read channel implementation will make or
break small drive data reliability in the near term,
by amplifying and accurately differentiating the
signal from the drive’s R/W heads. The exceptions
will occur when controller manufacturers take on
an enlarged responsibility for implementing ECC.

In the foreseeable future, 5¼” drives will use
enhanced equalization along with the ability to
modify particular frequencies of the readback signal
spectrum. This capability is essential to vertical
recording schemes. Adaptive filtering will also be
implemented using a microprocessor to change the
bandwidth of a channel as a function of the head’s
position on the disk. Finally, enhanced signal processing,
using correlation techniques adapted from
radar technology, will allow accurate processing of
signals in even worst-case signal-to-noise conditions.